Database Design
I. Relational Database Design
Resources
Database design is the process of organizing and structuring data in a logical manner to efficiently store, manage, and retrieve information in a database system. It involves defining the database structure, relationships between data elements, and constraints to ensure data integrity and consistency.
Database design is important for several reasons:
- Efficiency: A well-designed database optimizes data storage and retrieval, improving system performance.
- Data integrity: Proper design helps maintain accurate and consistent data across the database.
- Scalability: Good design allows the database to grow and adapt to changing requirements over time.
- Reduced redundancy: It minimizes data duplication, saving storage space and reducing inconsistencies.
- Improved data security: Proper design facilitates the implementation of access controls and security measures.
1. Schema Design
A database schema is a formal description of the structure, organization, and relationships of data within a database. It serves as a blueprint for how data is stored and accessed. The schema defines:
- Tables and their attributes (columns)
- Relationships between tables
- Constraints and rules for data integrity
- Data types for each attribute
- Indexes and other performance optimization structures
There are three levels to schema design. These three levels form a progression from abstract to concrete, with each level adding more detail and specificity.
a. Conceptual
- This is the highest level of abstraction.
- It provides a big-picture view of what the database will contain.
- It focuses on identifying main entities, their attributes, and relationships.
- Usually represented using an Entity-Relationship Diagram (ERD).
- Does not include technical details.
b. Logical
- This is a more detailed representation of the database structure.
- It translates the conceptual model into a specific data model (e.g., relational, object-oriented).
- Defines tables, columns, relationships, and constraints.
- Still independent of any specific database management system (DBMS).
Look up Tables
A lookup table is a data structure used to map, reference, or convert data efficiently. In database design and computer science, lookup tables serve various purposes:
- speeding up data retrieval
- standardizing information
- simplifying data relationships
- etc.
Example: A typical example is a table that maps status codes to their descriptions. For instance, in a customer order system:
1
→Pending
2
→Shipped
3
→Delivered
This allows the system to store the numeric code in the order table, saving space and standardizing data while the lookup table provides the descriptive text.
CREATE TABLE status_codes (
status_id INT PRIMARY KEY,
description VARCHAR(50)
);
INSERT INTO status_codes (status_id, description)
VALUES
(1, 'Pending'),
(2, 'Shipped'),
(3, 'Delivered');
status_id | description |
---|---|
1 | Pending |
2 | Shipped |
3 | Delivered |
product_id | product_name | product_status |
---|---|---|
1 | Latop | 1 |
2 | Sofa | 1 |
3 | TV | 3 |
4 | Smartphone | 2 |
By storing product_status
as a numeric code and linking it to the status_codes
lookup table, you achieve efficient data storage and maintain consistency.
When you need to update the description of a status, you only need to modify the status_codes
table, and the products
table will automatically reflect the updated description through joins.
c. Physical
- This is the lowest level, describing how data is actually stored.
- It includes specific details for implementation in a particular DBMS.
- Defines storage structures, file organizations, indexes, access methods, and security measures.
- Optimized for performance on the chosen DBMS platform.
2. Data Integrity
Data integrity refers to the overall accuracy, completeness, consistency, and reliability of data throughout its lifecycle in a database system. It's a crucial aspect of database design and management that ensures data remains valid, trustworthy, and usable over time.
There are three types of data integrity:
1. Entity Integrity
- Ensures that each row in a table is a unique entity.
- Primarily enforced through primary keys.
- Prevents duplicate or null entries in columns designated as primary keys.
- Example: In a student database, each student record must have a unique student ID.
2. Referential Integrity
- Maintains consistent relationships between tables.
- Enforced through foreign keys.
- Ensures that values in a foreign key column correspond to existing values in the referenced table's primary key.
- Prevents orphaned records (records with foreign key values that don't exist in the related table).
- Example: In an order system, each order must be associated with a valid customer ID that exists in the customer table.
3. Domain Integrity
- Ensures that all data in a column falls within a defined domain or set of acceptable values.
- Enforced through various constraints like data types, check constraints, default values, and rules.
- Maintains the accuracy and validity of data within each field.
- Example: Ensuring that an 'age' column only accepts positive integers, or that a 'grade' column only accepts values A, B, C, D, or F.
CREATE TABLE students (
student_id INT PRIMARY KEY,
name VARCHAR(100),
age INT CHECK (age > 0), -- Ensures age is positive
grade CHAR(1) CHECK (grade IN ('A', 'B', 'C', 'D', 'F')) -- Ensures grade is within allowed values
);
II. Database Relationship
In a database, relationships define how tables are connected and how data is associated between them.
Parent - Child Tables
Parent and child tables are terms used in relational database design to describe the relationship between two tables connected by a foreign key.
- Parent Table: A parent table, also known as a referenced table, is the table that contains the primary key which is referenced by a foreign key in another table.
- Child Table: A child table, also known as a referencing table, is the table that contains the foreign key which references the primary key of the parent table.
The relationship between parent and child tables is typically a one-to-many relationship, where one record in the parent table can be associated with multiple records in the child table.
CREATE TABLE Authors (
author_id INT PRIMARY KEY,
author_name VARCHAR(100)
);
CREATE TABLE Books (
book_id INT PRIMARY KEY,
title VARCHAR(200),
author_id INT,
FOREIGN KEY (author_id) REFERENCES Authors(author_id)
);
Joiner Table
A joiner table (junction, bridge table) is a table in a relational database used to connect two or more tables, typically to resolve many-to-many relationships. Its primary key is usually a composite key composed of foreign keys referencing the primary keys of the tables it's connecting.
Example:
We have the employees
and clients
tables. To express the many-to-many relationship where an employee can work with multiple clients and a client can be served by multiple employees, we create a joiner table called works_with
.
+--------+-----------+-------------+
| emp_id | client_id | total_sales |
+--------+-----------+-------------+
| 107 | 400 | 55,000 |
| 101 | 400 | 267,000 |
| 101 | 404 | 22,500 |
+--------+-----------+-------------+
The joiner table works_with
has a composite primary key consisting of emp_id
and client_id
. This combination uniquely identifies each employee-client relationship.
emp_id
is a foreign key referencing theemployees
table.client_id
is likely a foreign key referencing theclients
table.
There are three primary types of relationships:
1. One-to-One (1:1)
In a one-to-one relationship, each record in one table corresponds to exactly one record in another table, and vice versa.
- Example: Table:
users
anduserProfiles
- Explanation: Each user has one unique profile, and each profile is associated with one user. The relationship between the tables
users
anduserProfiles
is one-to-one.
- Explanation: Each user has one unique profile, and each profile is associated with one user. The relationship between the tables
Designing One-to-One Relationships
Implementation: Usually, you'd have a foreign key in one table referencing the primary key in the other. This foreign key should have a unique constraint.
-- Example Schema
CREATE TABLE users (
user_id INT PRIMARY KEY,
username VARCHAR(50) UNIQUE,
email VARCHAR(100) UNIQUE,
password_hash VARCHAR(255)
);
CREATE TABLE userProfiles (
profile_id INT PRIMARY KEY,
user_id INT UNIQUE,
full_name VARCHAR(100),
bio TEXT,
date_of_birth DATE,
FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
2. One-to-Many (1:M)
In a one-to-many relationship, a record in one table can have multiple associated records in another table, but each record in the second table is associated with only one record in the first table.
- Example: Tables:
users
andcards
- Explanation: A single user can have many cards, but each card belongs to only one user. The relationship between
users
andcards
is one-to-many.
- Explanation: A single user can have many cards, but each card belongs to only one user. The relationship between
Designing One-to-Many Relationships
Implementation: Add a foreign key in the "many" table that references the primary key of the "one" table.
-- Example Schema
CREATE TABLE users (
user_id INT PRIMARY KEY,
user_name VARCHAR(100)
);
CREATE TABLE cards (
card_id INT PRIMARY KEY,
card_number VARCHAR(10),
user_id INT,
FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
3. Many-to-Many (M:M)
In a many-to-many relationship, records in one table can be associated with multiple records in another table, and vice versa. This relationship is usually implemented using a joiner (or junction/bridge) table.
- Example: Tables:
students
andcourses
- Explanation: A student can enroll in multiple courses, and a course can have many students. To manage this relationship, a joiner table,
Enrollment
, is used to linkstudents
andcourses
. Theenrollment
table typically contains foreign keys that reference the primary keys of thestudents
andcourses
tables.
- Explanation: A student can enroll in multiple courses, and a course can have many students. To manage this relationship, a joiner table,
Designing Many-to-Many Relationships
Implementation: Create a joiner table with foreign keys to both main tables.
CREATE TABLE students (
student_id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE courses (
course_id INT PRIMARY KEY,
course_name VARCHAR(100)
);
CREATE TABLE enrollments (
student_id INT,
course_id INT,
enrollment_date DATE,
PRIMARY KEY (student_id, course_id), -- composite key
FOREIGN KEY (student_id) REFERENCES Students(student_id),
FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);
4. Self-Referencing Relationship
In a self-referencing relationship, a table has a foreign key that references its own primary key.
This allows you to model hierarchical relationships, such as an employee reporting to a manager who is also an employee.
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
name VARCHAR(100),
manager_id INT,
FOREIGN KEY (manager_id) REFERENCES Employees(emp_id)
);
III. Database Normalization
Database normalization is the process of organizing the structure of a relational database to minimize redundancy and improve data integrity. The goal is to design a database that is both efficient and easy to maintain, by dividing large tables into smaller, related tables and defining relationships between them.
Normalization is typically achieved by following a series of steps called normal forms. Each normal form addresses specific types of redundancy and dependency.
1. First Normal Form (1NF)
Ensures that the table is organized into rows and columns where each column contains atomic (indivisible) values, and each column contains only one type of data.
Example: A table with a column that stores multiple phone numbers in a single field should be split so that each phone number is stored in a separate row or column.
Original table:
CustomerID | Name | Phone Numbers |
---|---|---|
1 | Alice | 123-456-7890, 987-654-3210 |
→ a 1NF compliant table: |
CustomerID | Name | Phone Numbers |
---|---|---|
1 | Alice | 123-456-7890 |
1 | Alice | 987-654-3210 |
2. Second Normal Form (2NF)
Builds on 1NF by ensuring that all non-key attributes are fully functionally dependent on the primary key. This means that there should be no partial dependency of any column on the primary key.
A partial dependency occurs when a non-key attribute depends on only part of a composite primary key.
Example: In a table with a composite primary key (e.g., a table tracking student enrollments with student_id
and course_id
as keys), all other columns must depend on the full primary key, not just a part of it.
Original table:
Student_ID | Course_ID | Course_Name | Instructor |
---|---|---|---|
1 | C1 | Math | Smith |
1 | C2 | Physics | Johnson |
2 | C1 | Math | Smith |
→ a 2NF compliant table: split into 2 tables Enrollments and Courses |
Student_ID | Course_ID |
---|---|
1 | C1 |
1 | C2 |
2 | C1 |
Course_ID | Course_Name | Instructor |
---|---|---|
C1 | Math | Smith |
C2 | Physics | Johnson |
C1 | Math | Smith |
3. Third Normal Form (3NF)
Further refines 2NF by ensuring that all non-key attributes are not only dependent on the primary key but are also non-transitively dependent. This means no non-key attribute should depend on another non-key attribute.
A transitive dependency occurs when a non-key attribute depends on another non-key attribute, rather than depending directly on the primary key.
Example: Consider a table OrderDetails
with the following attributes:
OrderID | ProductID | ProductName | Category | CategoryManager |
---|---|---|---|---|
1001 | P101 | Laptop | Electronics | John Smith |
1002 | P102 | Smartphone | Electronics | John Smith |
1003 | P201 | Desk Chair | Furniture | Mary Johnson |
1004 | P101 | Laptop | Electronics | John Smith |
In this table:
OrderID
andProductID
form the composite primary key.ProductName
depends onProductID
.Category
depends onProductID
.CategoryManager
depends onCategory
, not directly on the primary key.
This violates 3NF because CategoryManager
is transitively dependent on the primary key through Category
.
To comply with 3NF, we should split this into two tables: OrderDetails
and Categories
.
OrderID | ProductID | ProductName | Category |
---|---|---|---|
1001 | P101 | Laptop | Electronics |
1002 | P102 | Smartphone | Electronics |
1003 | P201 | Desk Chair | Furniture |
1004 | P101 | Laptop | Electronics |
Category | CategoryManager |
---|---|
Electronics | John Smith |
Furniture | Mary Johnson |
Now:
OrderDetails
contains only attributes that are fully dependent on the primary key.Categories
separates theCategoryManager
, which depends onCategory
, into its own table.